Bitmap Indexing-based Clustering and Retrieval of XML Documents

Authors

  • Jong P. Yoon
  • Vijay Raghavan
  • Venu Chakilam
Abstract

This paper describes a bitmap-indexing-based technique for clustering XML documents. XML documents can be represented hierarchically by their elements. To improve information retrieval performance, documents can be indexed using bitmap techniques. Such a bitmap index is sparse, meaning that it contains many unnecessary zero bits, especially along the word dimension. To remove zero bits and improve retrieval performance, we propose generating several small bitmap indexes that are not sparse. Using the similarity and popularity operations available in bitmap indexes, three clustering techniques are discussed: top-down clustering, bottom-up clustering, and mixed clustering. Experimental results are also presented.
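To make the idea concrete, the following is a minimal sketch of a document-by-term bitmap with similarity and popularity operations. The abstract does not define these operations, so the definitions here are assumptions: similarity is taken as one minus the normalized Hamming distance between two document rows, and popularity as the fraction of documents in which a given bit is set. The function names, the sample documents, and the element-path vocabulary are all hypothetical.

```python
from typing import Dict, List, Set


def build_bitmap(docs: Dict[str, Set[str]], vocabulary: List[str]) -> Dict[str, int]:
    """Encode each document as an integer whose i-th bit is 1
    when vocabulary[i] occurs in that document."""
    bitmap = {}
    for name, terms in docs.items():
        row = 0
        for i, term in enumerate(vocabulary):
            if term in terms:
                row |= 1 << i
        bitmap[name] = row
    return bitmap


def similarity(a: int, b: int, width: int) -> float:
    """Assumed definition: 1 - (Hamming distance / number of bit columns)."""
    return 1.0 - bin(a ^ b).count("1") / width


def popularity(bitmap: Dict[str, int], bit: int) -> float:
    """Assumed definition: fraction of documents that have the given bit set."""
    rows = list(bitmap.values())
    return sum((row >> bit) & 1 for row in rows) / len(rows)


# Hypothetical documents, each described by the element paths it contains.
docs = {
    "d1": {"book/title", "book/author"},
    "d2": {"book/title", "book/author", "book/price"},
    "d3": {"cd/artist"},
}
vocabulary = sorted(set().union(*docs.values()))
bitmap = build_bitmap(docs, vocabulary)

sim_d1_d2 = similarity(bitmap["d1"], bitmap["d2"], len(vocabulary))
sim_d1_d3 = similarity(bitmap["d1"], bitmap["d3"], len(vocabulary))
pop_author = popularity(bitmap, vocabulary.index("book/author"))
```

Under these assumed definitions, d1 and d2 differ in a single bit and score higher than d1 and d3, which share no paths. A clustering step could group rows by such scores and then drop the columns that are zero within each group, giving the smaller, denser indexes the abstract mentions.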

Similar resources

BitCube: Clustering and Statistical Analysis for XML Documents

In this paper, we describe a new bitmap indexing technique to cluster XML documents. XML is a new standard for exchanging and representing information on the Internet. Documents can be hierarchically represented by XML-elements. XML documents are represented and indexed using a bitmap indexing technique. We define the similarity and popularity operations available in bitmap indexes and propose ...

Metaheuristic Clustering of Persian XML Documents Based on Structural and Content Similarity

Due to the increasing number of XML documents, organizing them effectively in order to retrieve useful information is essential. A possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...

Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica

Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...

Semantic and Structure Based XML Similarity: The XS3 Prototype

Due to the ever-increasing web availability of XML-based data, an efficient approach to compare XML documents becomes crucial in information retrieval. Such comparison of XML documents has applications in version control (finding, scoring and browsing changes between different versions of a document), change management and data warehousing (support of temporal queries and index maintenance) [3,...

A methodology for indexing and retrieval of information from XML document

The number of XML documents with markup elements is increasing rapidly on the World Wide Web. The pressing question is how these documents can be used for the benefit of future generations, so that indexing and retrieving them becomes more accurate and precise. Efforts to establish standards for indexing and retrieving XML documents are growing. Currently the structured docume...

Publication date: 2001